Reaching the precision limit with tensor-based wavefront shaping

Perturbations in complex media, due to their own dynamical evolution or to external effects, are often seen as detrimental. Therefore, a common strategy, especially for telecommunication and imaging applications, is to limit the sensitivity to those perturbations in order to avoid them. Here, instead, we consider enhancing the interaction between light and perturbations to produce the largest change in the output intensity distribution. Our work hinges on the use of tensor-based techniques, presently at the forefront of machine learning explorations, to study intensity-based measurements where its quadratic relationship to the field prevents the use of standard matrix methods. With this tensor-based framework, we can identify the maximum-information intensity channel which maximizes the change in its output intensity distribution and the Fisher information encoded in it about a given perturbation. We further demonstrate experimentally its superiority for robust and precise sensing applications. Additionally, we derive the appropriate strategy to reach the precision limit for intensity-based measurements, leading to an increase in Fisher information by more than four orders of magnitude compared to the mean for random wavefronts when measured with the pixels of a camera.

1 Derivations for the Fisher information expressions

Fisher information for intensity measurements
Given a random process determined by a probability mass function ( |) for the set of random variables  and with parametric dependence on , the Fisher information for the parameter  is defined as where the overline denotes the expected values over all the possible values of  [1,2].Here, ( |) represents the inaccuracy of the measured intensity distribution over the  output modes {y 1 , . . ., y  } due to the presence of noise.Thus, the measured intensity distribution is given by  = I +  where I = |e (out) | 2 is the expected intensity distribution and  represents the noise which is assumed to be Gaussian with known standard deviation   for the  th output mode.The probability mass function can then be written as The intensity value of the n th output mode is defined as   = | (out)

𝑛
| 2 where  (out)    is the coefficient of the n th output mode in the linear expansion e (out) =   (out)  y  .The Fisher information then takes the following simple form (S3)

Fisher information as a rank-one approximation
Writing the output field in terms of the matrix H and the input field, which is decomposed in terms of the input modes {x 1 , . . ., x  }, e (in) = =1  (in)  x  , and normalized, ||e (in) || = 1, we get that (H   H *  ) (in)    (in) where W (3)     = H *   H  is a third-order tensor encoding the nonlinear relationship between the input field and the changes in intensity induces by first-order variations of the parameter .For simplicity, we have taken the noise over the output modes to be the same,  =   for all , but, if needed, these output modedependent noise variations can be incorporated in the definition of W (3) .It is possible to keep rewriting this expression for the Fisher information to get a more insightful expression, where we introduced the real vector a defined component-wise as This real vector a is proportional to the rate of change of the intensity distribution over the output modes with respect to the parameter  for the input field e (in) .Its value is fixed as the one that maximizes J for a fixed e (in) .It is easy to show that the ∥a∥ = J 1/2 () so that substituting a by ã = a/∥a∥ in the expression for the Fisher information we get, Here, ⟨. . ., . ..⟩ denotes the inner product of two tensors as defined in Eq. (S19).Finally, by squaring both sides of Eq.S7b, the Fisher information can be rewritten as Given that ∥ ã∥ = ||e (in) || = 1, this expression can almost be seen as a rank-one approximation of the third-order tensor W (3) [3].The difference lies on the fact that ã depends on the input field e (in) and the tensor we are trying to approximate W (3) .
Let us instead consider the cost function where ∥u∥ = ∥v∥ = 1, and u is independent of v.This cost function has more degrees of freedom than J , and is easy to show that C =  2 J if for e (in) = v.Now, if we find u and v such that C is a maximum then it is easy to verify that u is related to v according to Eq. (S10) and Moreover, since J lives in a smaller subspace than C, at these points J is also at a maxima for e (in) = v.This demonstrates that the input field that maximizes the Fisher information can be found by computing the best rank-one approximation of W (3) .

Changing the output projection modes
To project the output field onto a set of  orthonormal output projection modes (OPM), we simply apply the corresponding projection operator P to the output field.The projection operator satisfies P † • P = 1  where 1  is the identity matrix of dimension .Note that P is generally not unitary since  can be less than the total number of output modes in the system.The projected output field can then be written as e (out) where p  is the  th column of P, i.e. the  th OPM, and the intensity distribution over the new OPM is given by  ( )  = | (out) , | 2 .Rewriting the Fisher information in terms of an output modes basis that contains all the information about the output field we have that, where It is easy to see that the E  operator is Hermitian and of rank two.Therefore, it only has two non-zero eigenvalues, which means that all the information available in e (out) can be obtained by projecting into two modes.Note that this result applies even to changes happening at the source and not only those encoded in the matrix H.

Fisher information for optimal input-output combinations
Now, if we want to find the optimal input-output combination in order to reach the precision limit for intensity-based measurements, then we need to write the Fisher information in terms of the input field.Moreover, as we just showed, we only need to consider projection the output into two OPM.Therefore, setting e (out) = H • e (in) and  = 2 in Eq. (S13b), we get where we defined the fourth-order tensor W (4) whose components are given by Another way to understand the dependence of the Fisher information on the OPMs is by noticing that the tensor W (3) does not follow the standard transformation rules when the output basis is changes.This is in part due to the fact that the output space corresponds to changes in intensity which is a different physical quantity than the output field.For this reason, when including the OPMs in the optimization it becomes necessary to introduce the fourth-order tensor W (4) which does follow the standard transformation rules under changes of the output basis.

Finding the best rank-one approximation 2.1 Best rank-one tensor approximation
Let us consider a tensor T ∈ C  1 × • • • × C   of order .The best rank-one approximation of T is defined as the rank-one tensor that solves the following minimization problem min u (1) It is possible to show that this minimization problem is equivalent to the following maximization problem (see Ref. [3] for proof), max u (1) where ⟨• • • , • • • ⟩ denotes the inner product induced by the Frobenius norm, namely ⟨T (1) , Finding the best rank-one approximation is a nonlinear problem for which finding the optimal solution is not as straightforward as it is for matrices.

Input
T tensor of order L

Output
[U (1) , . . ., U () ] list of matrices containing the higher-order singular vectors S core tensor Procedure 1: S ← T ⊲ initialize the core tensor 2: for  ← 1 to  do 3: ⊲ computes singular value decomposition 5: U () ← U ⊲ assign the left singular vectors as the higher-order singular vectors 6: ⊲ update core tensor by computing the n-mode product 7: end for

Higher-order singular value decomposition.
A first step into the right direction for finding the best rank-one approximation can be taken by computing the HOSVD which is defined in what follows.A tensor of order  can be decomposed as a sum of rank-one tensors using a higher-order version of the SVD, where the higher-order singular vectors u form an orthonormal basis of C   , S  1 •••  are the entries of the core tensor S, and ⊗ denotes the outer product.The core tensor satisfies the properties of 1. all orthogonality: where S   = denotes the tensor obtained by setting the  th index equal to .For matrices, these properties reduce to the diagonality condition of the matrix of singular values and their ordering.
Another important property of the HOSVD is that it preserves the partial symmetries of the original tensor, for example if The biggest advantage of the HOSVD against other types of tensor decompositions is that it can be easily computed via the SVD of its matrix unfoldings (see Algorithm 1 and Refs.4, 5 for more details).However, when truncating the HOSVD to a single term we do not get the best rank one approximation, but we do get an excellent first guess.

Alternating least-squares algorithm
Finding the best rank-one approximation is a particular case of more general types of lower-rank tensor approximations based around the polyadic and Tucker decompositions [6,7].There are several algorithms

T
tensor of order L [u (1) , . . ., u () ] initial guess for the best-rank-one approximation P symmetries of the tensor   number of iterations

Output
[u (1) , . . ., u () ] list of vectors providing the best-rank-one approximation Procedure ⊲ get the order of the tensor 2: for  ← 1 to    do 3: ⊲ solve least squares minimization 5: end for 6: [u (1) , . . ., u () ] ← symmetrize([u (1) , . . ., u () ], P) ⊲ impose symmetries 7: end for that have been developed to find them and that could, in principle, be applied to solve the optimization problem in Eq. (S17).Here, we use the iterative alternating-least squares (ALS) algorithm and use as a starting point the first higher-order singular vectors.For the ALS algorithm, instead of tackling the minimization over a, b, and c all at once, we iterate over each vector by solving the standard least-square problem that results from leaving the other ones fixed (see Algorithm 2).For example, we start by solving for which we can find the exact solution.Then, we do the same for b and c.This is repeated for a predefined number of times until convergence is reached.One issue that can arise from the solution is that it does not satisfy the symmetries of the original tensor.Here, we were able to adapt the ALS algorithm in order to keep the symmetries of the original tensor if needed.Let us consider the case of the third-order tensor W (3) which has a partial Hermitian symmetry in the last two indices, W (3)     = W (3) *   , where we seek the vectors a and b that solve This simple adaptation gives enough freedom so the algorithm converges and provides a solution with the desired symmetry.A similar approach is taken to compute the rank-one approximation of W (4) .
3 Including the output projection modes in the Fisher information optimization

Finding the optimal output projection modes
As demonstrated in Sec.1.3, for a fixed input field the optimal OPMs are given in terms of the Hermitian matrix, where e 1 = e (out) / e (out) and e 2 =   e (out) /   e (out) , and  = e (out)   e (out) .In order to find these optimal OPMs, we first need to write this matrix in terms of an orthogonal basis.Moreover, since due to its rank only two elements suffice.Using the normalized output field e 1 and the normalized orthogonal component of e 2 we have that, where with   and   being the real and imaginary parts of , respectively.The two non-zero eigenvalues can then be found to be given by where it is easy to see that  + ≥ 0, and the corresponding normalized eigenvectors are with Now, the Fisher information in Eq. (S13c) can be rewritten as The optimal OPMs need to be given by a linear combination of the eigenvectors and be orthogonal between them.Therefore, we can write them as, Substituting these expressions into the Fisher information we get, which is a periodic function with period /2.Taking the first and second derivatives with respect to , we get the Fisher information attains it maxima for  = 0, thus showing that the optimal OPMs are the eigenvectors of E  , For small perturbations and unitary systems in which the perturbation does not induce losses, it can be shown that   = 0.In this case, the eigenvectors are given by the symmetric and antisymmetric combinations of the output field e (out) and the orthogonal component of the derivative   e (out) .

Finding the optimal input-output combination
The first step in finding the optimal input-output combination is to compute the fourth-order tensor W (4) .Given the number of pixels used as output modes to measure H, the size of this tensor would be too large to be useful for computations.However, since the number of input modes is much lower, one can see that using all the 1444 output pixels might be an overkill.Therefore, we first compute the SVD for the two TMs that we used to compute W (3) and use their output singular vectors to form an orthonormal set of output modes.This new set of output modes has double the number of input modes used, but it allows us to capture all the information encoded in the pixel basis but at a much smaller size.Then, we project the TMs onto this set of output modes and use this projected TMs to compute W (4) .
As mentioned in the main text, when trying to find the optimal input-output combination that allows reaching the precision limit, the Fisher information given by Eq. ( 6) of the main text no longer takes the form of a lower-rank tensor decomposition.Nonetheless, it is possible to rewrite it as a nonlinear optimization where only the expansion coefficients of the input field in terms of the input modes are used as optimization parameters.This is done by first computing the output field using the TM, then getting the optimal OPMs and using those to compute the Fisher information.This optimization was implemented using the neural network framework PyTorch.Here again, we need to provide a first guess, and, as for the ALS algorithm, we chose to use the first term of the HOSVD of W (4) .

Experimental characterization of an MMF under perturbation 4.1 Experimental setup
The optical setup is represented in Fig. S1.The light source consists of a continuous linearly polarized laser beam at 1550 nm (TeraXion NLL) injected into a 10:90 polarization-maintaining fiber coupler (PNH1550R2F1) in order to separate it into the shaped signal field and the reference field.The 90% arm is collimated and expanded to illuminate a digital micromirror device (DMD) (Vialux V-650L) which is used to modulate the signal field in amplitude and phase using Lee holograms [8,9].The light is converted into left circular polarization using a quarter-wave plate.The shaped signal field is then imaged with a 4f system onto the input facet of a 25 cm-long step-index fiber with a 50 µm core and 0.22 numerical aperture, which is held approximately straight.The output facet is imaged via another 4f system onto an InGaAs camera (Xenics Cheetah 640-CL 400 Hz) after passing through a quarter-wave plate, followed by a beam displacer to select the left-circularly polarized component.The other 10% arm is used to produce a tilted reference that is made to interfere with the signal field in order to be able to retrieve the output field via off-axis holography [10].An Arduino-controlled shutter allows blocking the reference field to perform intensity measurements of the signal field.The fiber can be deformed, roughly in the middle, by pressing on it using a 50 nm precision dc servo motor actuator (Thorlabs Z812).All the values reported for the deformation correspond to the linear displacement of the actuator and not the deformation of the fiber core, which is much smaller since most of the deformation is absorbed by the coating of the fiber.

TM and W (3) measurements
First, we measure the TM of the MMF in the pixel basis, by sending 7200 square layouts consisting of 37 × 37 square macropixels whose value is either zero or a random phase of amplitude one.Each macropixel is formed by grouping 16 × 16 pixels of the DMD, where the desired phase and amplitude value is encoded via Lee holograms [8] and selecting the first order of diffraction at the Fourier plane.The corresponding output fields are recovered from the interferograms between the reference and signal fields, and subsequently projected onto a square pattern of 44 × 44 macropixels formed by grouping 4 × 4 pixels of the camera.Regrouping all input and output fields into the columns of matrices X and Y, respectively, we reconstruct the TM via H = Y • X −1 where X −1 denotes the pseudoinverse of X.
Then, we compute the SVD of the TM in the pixel basis and use it to identify the fiber modes as the singular vectors with singular values in the almost constant plateau (see Fig. S2).For the fiber we used there were 144 fiber modes as opposed to the 129 predicted through numerical simulations.This indicates that some cladding modes are sufficiently close to the cutoff that they propagate without undergoing significant absorption for the short length we are using.All subsequent TM measurements are performed by sending 1440 random inputs obtained by randomly superimposing all 144 fiber modes.This allows us to drastically reduce the size of the TM and the higher-order tensors without losing significant information.The output fiber modes are simply the outputs generated by sending the fiber modes as inputs.These outputs are orthogonal since they correspond to the first 144 output singular vectors of the TM measured in the pixel basis.To determine the third-order tensor W (3) , we measure two TMs, H (±) , for two different values of the deformation,   ± /2, centered around the reference value   .Then, these two measurements can be used to approximate the derivative with respect to  using finite differences so that W (3)     = A similar approach is taken to construct the fourth-order tensor W (4) .

Noise characterization
Before performing the experimental estimation of small deformations, we need to verify the veracity of the assumption of Gaussian noise made in this work for our setup.For this, we performed a set of characterization measurements for two values of the input power and the four fields used to perform the estimation.The first one was performed at the highest input power possible without saturating any pixels of the camera and taking care to send the same amount of energy for each input field.The second one was performed using only 20% of the total power used for the first one.For each set, we sent 500 copies of the four fields and used the measured output intensity distributions to compute the mean and standard deviation value of the intensity over the output pixels as shown in Fig. S3.At higher power we can clearly see some intensity-dependent behavior in the standard deviation which is used to characterize the noise, particularly for the MIIC for which the focal spot is clearly visible.At lower power, however, this dependence on the intensity for the noise is almost completely gone with the   values being fairly uniform.
To confirm the assumption of Gaussian noise, Fig. S4 shows the quantile-quantile plot for the measured intensity deviation from its mean value including every pixel, realization and measured field when compared to a Gaussian distribution for high and low intensity values.This plot compares the cumulative distribution functions obtained from the theoretical Gaussian distribution and the data.If the data follows the model then the points obtained (shown as blue dots) should follow a straight line at 45 • (shown as a continuous red line).For the high intensity data, we see a clear deviation for positive values where the points clearly take larger values than those predicted by a Gaussian distribution.This means that the distribution for the data has a positive heavy tail which is consistent with the intensity dependent noise that we see in Extended Data Figure 3.However, this deviation disappears at lower intensity values where all the points follow closely the predicted diagonal line, thus demonstrating that our assumption of Gaussian noise is valid.

Estimating the deformation
Given that the perturbations under consideration are small, it is sensible to assume a linear model for the measured intensity distribution  over the output modes after a change Δ in the perturbation.Explicitly, we have that where I(  ) represents the output intensity distribution over the output modes prior to changing the deformation, I  (  ) is the derivative of I with respect to  and evaluated at   , and () is a vector of normally distributed independent random variables with zero mean and standard deviation .For this linear model the minimum variance unbiased estimator [1] is given by In this model, both the reference intensity distribution I and the derivative   I are assumed to be known.To calibrate the derivative, we first set the input power as high as possible without saturating any pixels for any of the fields used for the estimation.Then, we perform 500 independent measurements for each field and for two deformations centered around the reference value   .Then, the derivative of the intensity distribution of each field is estimated by taking the difference between the mean for each deformation and dividing by the total deformation performed between the two points.As can be seen in Fig. S5, the distribution of the mean change in intensity obtained from these calibration measurements is quite smooth, even though the effect of noise is noticeable for each individual realization, and with its main values above the noise level shown in Fig. S3 for all the fields.Also shown in Fig. S5 are the distribution of the mean change in intensity and a single realization at the lower input power used and the same deformations used for the calibration.Here, we can see that each individual change is quite susceptible to noise and that its effect is even noticeable for the mean distribution, particularly for the worst performing field given by the random wavefront.This shows that the calibration at high intensity is only necessary in order to isolate the effects of the inherent sensibility of the input field to the perturbation and those arising from a poor calibration, particularly for the random input for which it is simply not possible to accurately estimate the derivative at a low intensity.For the reference intensity distribution, we can use the same input power used for the estimations since the mean intensity distribution is well above the noise level for all fields (see Fig. S3).Moreover, using the same input power for the reference as for the estimation allows the nonuniform background to be automatically removed.
6 Moving the perturbation along the fiber length

Fisher information as a function of the location of the deformation
It is interesting to study the dependence of the Fisher information on the specific location of the perturbation.To study this dependence we consider the following model.Given that the deformation is applied at a particular point along the fiber, we can divide our system, and thus the TM, into three part: where H 1 represents the transmission of light up to the region that is deformed, D represents the propagation through the perturbation, and H 2 the propagation through the final stretch following the deformation.Propagation through small stretches of an unperturbed fiber of length Δ can be modelled by the diagonal matrix, P(Δ) = exp(iΔ), where  is the diagonal matrix containing the propagation constants of the modes of the fiber.Therefore, one way to model the displacement of the deformation is by applying the operator P(Δ) representing the axial shifts to reduce the propagation distance on one side and increase it on the other side Given that the length of the fiber is fairly small and the deformation not too strong, the matrices H 2 and H 1 are mainly diagonal matrices (see e.g.Refs.[11,12]).Therefore, it is safe to assume that H 2 and H 1 commute with P() thus giving With this model we can compute how the Fisher information evolves as we change the location of the perturbation for the various channels that were derived in the main manuscript for the location Δ = 0.The results are shown in Fig. S6.These plots show that the optimal fields found at Δ remain quite sensitive within a range of 0.5 mm.Depending on the application, this can be more than enough to account for inaccuracies during its implementation.These results are perfectly suited to enhance the precision of point sensors.This type of sensor is designed for a specific parameter and at a specific location, which is what we consider.In this case, a significant drop in the Fisher information as we move away from the region of interest would be beneficial for these sensors since it would make them less sensitive to perturbations happening in other parts of the system.Fig. S6 | Changing the location of the perturbation.a-b, Evolution of the Fisher information with respect to the location of the perturbation for one-thousand random inputs (continuous line show the mean and the shaded region the full range), the maximum value of the fiber modes, the maximum value of the third-order ISVs, the MIIC, the channel using a single output projection mode, and the channel allowing reaching the precision limit by tailoring both the input and output.Note that for each value Δ the mean value for the random inputs has been used to rescale the others to simplify the comparison.Two different ranges for Δ are shown, a large range of 2 cm (a) and a smaller range of 2 mm (b).Fig. S7 | Blind a-c, Histogram and mean value (vertical lines) of the Fisher information for 1000 random inputs when projecting the output field onto the pixel basis, the fiber modes, and the output higher-order singular vector.The number of modes used at the output for the fiber modes and the fourth-order OSVs is a, 30, b, 10, and c, 2.
If, instead, the goal is to develop a distributed sensor which can detect perturbations with high sensitivity along the full length of the fiber then the solutions shown in the main manuscript will not be optimal.Nonetheless, our formalism can help address this problem as well.From Fig. S6 it can be seen that the third-order ISV actually provides us with a channel that remains more sensitive than the fiber modes even for large variations of the position of the deformation.This shows that it should be possible to find an input field that is optimal for sensing perturbations irrespective of their location and that the HOSVD provides us again with an excellent starting point, showing again the usefulness of the techniques we are introducing.

Optimal blind estimation output projection modes
Another thing that can be noted from Fig. S6 is that the channels for which the OPMs have been optimized exhibit larger variations in their Fisher information as a consequence of their specificity to the deformation and specific output field.Much less noticeable variations are present for the channels using the pixel basis as OPMs because this basis is not optimized for any field in particular.These features point to another strategy for dealing with distributed perturbations in which optimal OPMs that do not depend specifically on the input field need to be found.Using the fourth-order output singular vectors (OSVs) of the fourthorder tensor W (4) as OPMs, it is possible to boost the precision for any input field on average.Figure S7 shows the histograms and mean values obtained when computing the Fisher information for one thousand random inputs when the OPMs are the pixels of a camera, the modes of the fiber, and the fourth-order OSVs.
The benefits of using other OPMs compared to the pixels of the camera become obvious.Even when just 10 of the fiber modes or the fourth-order OSVs are used, they significantly increase the mean value Fig. S8 | Blind estimation with changing location.Mean value of the Fisher information obtained for onethousand random input fields using as OPMs the pixels of a camera, the fiber modes and the fourth-order OSVs.For the pixel basis the shaded region indicates the range obtained and for the fiber modes and the fourth-order OSVs three different curves are shown for each, the continuous line uses 30 elements, the dashed line uses 10, and the dotted line only 5.
with respect to that obtained with the pixel basis.In particular, the fourth-order OSVs exhibit a much higher mean Fisher information than the one obtained with the fiber modes, and remain a better candidate even when only two of its elements are used as OPMs.Similarly to what we did before, it is also possible to study their behavior as we change the position of the deformation.The results are shown in Fig. S8 where it can be seen that while there is still a clear dependence on the specific location of the deformation, the fourth-order OSVs provide a better alternative to the fiber modes and the pixels of a throughout a large displacement of the perturbation.

∥
T − a ⊗ b ⊗ b * ∥. (S22) Initially, we do not force the symmetry and allow the last vector to be different, say c and start by minimizing ∥ T − a ⊗ b ⊗ c∥.We proceed as a standard ALS routine by fixing b and c, and solving the minimization problem min a ∥ T − a ⊗ b ⊗ c∥, (S23) which is a standard least-squares problem with a fixed solution.Then, we do the same for b and c.Before performing another loop, we symmetrize the lower rank tensor by setting

Fig. S1 |
Fig. S1 | Optical setup.DMD, digital micromirror device; L i , lenses; QWP, quarter-wave plates; M, mirror; D, diaphragm; BD, beam displacer; BS, beam splitter.The shutter is used to block the reference beam in order to perform intensity measurements.

Fig. S2 |
Fig. S2 | Identifying the fiber modes.Singular values of the TM measured in the pixel basis.The fiber modes are identified as the singular vectors for which the singular values light in the plateau.The cutoff value is marked by the vertical red line.

Fig. S3 |
Fig. S3 | Noise characterization.a-d,i-l, Mean intensity distributions and e-h,m-p, standard deviations for the four fields used for the parameter estimation at high-intensity values (a-h) and when only 20% of the light is sent (i-p).

Fig. S4 Noise
Fig. S4 Noise QQ plot.a-b, Quantile-quantile plot for the measured intensity deviation from its mean value including every pixel, realization and measured field when compared to a Gaussian distribution for high (a) and low (b) input intensity values.

Fig. S5 |
Fig. S5 | Change in intensity.a-d,i-l, Mean and e-h,m-p, single change in intensity distributions for the four fields for the measurements taken to calibrate the derivative and acquired at high-intensity values (a-h), and the same ones but using only 20% of the light sent for the calibration (i-p).